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Abstract 



Many matrices that arise in the solution of signal processing problems have a special 
displacement structure. For example, adaptive filtering and direction-of-arrival estimation 

_C ■ yield matrices of Toeplitz type. A recent method of Gohberg, Kailath and Olshevsky (GKO) 

allows fast Gaussian elimination with partial pivoting for such structured matrices. In this 
paper, a rounding error analysis is performed on the Cauchy and Toeplitz variants of the 
GKO method. It is shown the error growth depends on the growth in certain auxiliary vec- 
tors, the generators, which are computed by the GKO algorithms. It is also shown that in 
certain circumstances, the growth in the generators can be large, and so the error growth is 
much larger than would be encountered with normal Gaussian elimination with partial piv- 
oting. A modification of the algorithm to perform a type of row-column pivoting is proposed 

\& • which may ameliorate this problem. 

q 

l/~) ■ Keywords: Structured matrices, fast algorithms, displacement rank, generators, pivoting, 

error analysis, stability 
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1 Introduction 

Many problems which occur in signal processing, control theory and interpolation lead to a 



square or rectangular system with a special structure, for which the exact or least-squares 
solution is required. For example, adaptive filtering requires either the exact solution of a square 
Toeplitz system or the least-squares solution of a rectangular Toeplitz system. A Toeplitz matrix 
is one whose entries along the NW to SE diagonals are constant, i.e. element iy depends only 
on i — j. Other types of structured matrices which arise are Hankel matrices whose entries 
along the SW to NE diagonals are constant, Vandermonde matrices whose entries have the form 
Vij = x\~ , and Cauchy matrices whose entries have the form ctj = l/(tj — Sj), where the U and 
Sj are the elements of vectors t and s. 

Normally, the exact or least-squares solution of a linear system requires 0(n 3 ) operations 
to solve, where n is the order of the system. However, the structure of the systems mentioned 
above has been exploited in the past [1, 4, 7] to derive fast solvers, i.e. those that require 0(n 2 ) 
or fewer operations. These fast algorithms are in general numerically unstable for indefinite 
systems [3, 5, 11]. Recently, methods have been proposed [6, 10, 11] which are numerically 
stable, but which attempt to retain the 0(n 2 ) complexity. However, all of these algorithms will 
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require 0(rft) operations in the worst case. The BBH method [2] requires 0(n 2 ) operations in 
the worst case and can be shown to be weakly stable, but not stable in the usual sense of backward 
error analysis. Thus there is an interest in fast algorithms which require 0(n 2 ) operations in 
the worst case and can be shown to be stable. 

Recently, Gohberg, Kailath and Olshevsky [8] have shown how to perform Gaussian elimi- 
nation in a fast way with matrices with a special displacement structure. Such matrices include 
Toeplitz, Vandermonde, Hankel and Cauchy matrices, and generalizations thereof, called Toeplitz 
type, etc. They also show how to incorporate partial pivoting into the Cauchy and Vandermonde 
solvers. They point out that although pivoting cannot be incorporated directly into the corre- 
sponding Toeplitz or Hankel solvers, the Toeplitz and Hankel problems can be transformed by 
simple orthogonal operations into Cauchy problems. The solution to the original systems can 
be recovered from those of the transformed systems by the reverse orthogonal operations. Thus 
fast Gaussian elimination with partial pivoting can be carried out on Toeplitz, Vandermonde, 
Hankel and Cauchy systems. 

It might be assumed that such fast solvers should have the same stability properties as 
Gaussian elimination with partial pivoting. One of the aims of this paper is analyse the error 
behaviour of these algorithms by means of a backward error analysis. It is shown that error 
propagation depends on the magnitude of both the triangular factors L and U (as in Gaussian 
elimination) and the generators, auxiliary vectors which are computed during the course of the 
algorithm. 

It is shown that in some cases the generators can suffer a large growth and cause a corre- 
sponding growth in the backward and forward error. A modification is proposed which may 
prevent this growth, and so restore the stability of the algorithm in these cases. However, we 
can not prove that the modification is always successful. 

The paper is structured as follows. In §2, the Gohberg-Kailath-Olshevsky (GKO) algorithm 
for Cauchy and Toeplitz matrices is briefly described. The error analyses of the Cauchy and 
Toeplitz variants of the GKO algorithm are carried out in §3 and §4 respectively. In §5, examples 
for both variants are given where a large growth occurs in the generators and hence in the errors 
in the solutions. The modified version of the GKO algorithm is proposed in §6, and numerical 
tests of this are carried out there. Some conclusions are drawn and suggestions for future work 
are given in §7. 

Notation. The following notation is used, e is the machine epsilon, and n is the order of 
the matrix to be factorized. Scalars of the form q and ki are small constants. &j denotes the 
jth column of the identity matrix. Elementwise matrix multiplication is denoted by the centred 
circle o. For a matrix A, \A\ is the matrix of moduli of the {aij}, A denotes elementwise 
inversion, and A' denotes augmentation of A to order n by adding zero rows and zero columns 
respectively above and to the left of A. Other submatrices are indicated in MATLAB style, i.e. 
for a matrix A, A p -n r:s selects rows p to q of columns r to s, and a colon without an index range 
selects all of the rows or columns. 



2 The Gohberg-Kailath-Olshevsky (GKO) Algorithm 

In this section, we first define the displacement operator, displacement equation and displace- 
ment rank for structured matrices; we then give the general Gaussian elimination algorithm for 
structured matrices, followed by the variants for Cauchy and Toeplitz matrices. 

2.1 Displacement structure 

Gohberg et al [8] show that structured matrices satisfy a Sylvester equation which has the form 

V {Af , Ab} (R) = A f R-RA b = $V, (1) 

where Af and A b have some simple structure (usually banded, with 3 or fewer full diagonals), <£ 
and ^ are n x a and axn respectively, and a is some small integer (usually 4 or less). The pair 
of matrices <J>, ^ is called the {A f , A b }- generator of R, and a is called the {Af, A b }- displacement 
rank of R. 

Particular choices of Af and Af, lead to definitions of basic classes of matrices. Thus, for a 
Cauchy matrix 
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we have 
and 



Af = D t = di&g(t 1 ,t 2 ,...,t n ), A b = D s = diag(si,s 2 
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More general matrices, where Af and A b are as in (2) but $ and \l/ are general rank-a matrices, 
are called Cauchy-type. 



Similarly, for a Toeplitz matrix T = [ij 



A f = Z 1 
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2.2 Gaussian elimination for structured matrices 

Let the input matrix, R\, have the partitioning R\ 

Gaussian elimination is to premultiply R± by 



Yi Ri 



1 T 

-yiM i 



. The first step of normal 

, which reduces i?i to x 

tii 



where R2 = R\ — Yi^J/di is the Schur complement of d\ in R±. At this stage, i?i has the 
factorization 



R 
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d\ wf 




yiM 1 
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Schur complement R2 = 


C?2 Wj 
Y2 ^2 



eventually yielding 



w 



-T] 



a factorization i?i = LU, where column k of L is [0 r 1 y]F ] T , and row /c of U is [0 T 1 w fe 

The genesis of structured Gaussian elimination is the fact that the displacement structure is 
preserved under Schur complementation, and that the generators for the Schur complement Rk+i 
can be computed from the generators of Rk in 0{n) operations. This is expressed constructively 
in the following theorem, which is proved in [8]. 



Theorem 2.1 Let matrix R\ 



d\ wj 



where $' 1 ' = [cp\ 
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satisfy the Sylvester equation 
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Sylvester equation 

^ 7 {A/,2,A 6j2 }(-^2) = Af >2 R2 - R2A b)2 

where Afp and A^ are respectively Af t i and A^i with their first rows and first columns deleted, 

and where & 2 > 

by 
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Equations (7) and (8) form the basis of the following general structured Gaussian elimination 
algorithm. 

Algorithm 2.1 (Structured Gaussian elimination) 

1. Recover from the generator <I>( 1 ), Vl/W the first row and column of R\ 



Wi 



Yl -"22 

2. [1 yf /di] T and [di w^] are respectively the first column and row of L\ and U\ in the 
LU factorization of R±. 

3. Compute by equations (7) and (8), the generator <£>( 2 )^(2) f or ^g g c h ur complement i?2- 

4. Proceed recursively with &( 2 > and \p( 2 ). Each major step yields [1 y^ /dfc] T and [d& wj], 
which are respectively the first column and row of L^ and Uk in the LU factorization of Rk- 
Column k of L and row k of U are respectively [Ojf_! 1 yjF/dfc] T and [0^_ x d& wj]. 

Pivoting. Gaussian elimination without pivoting is unstable in general. One normally uses 
partial pivoting (swapping rows to bring the largest element in the first column to the pivot 
position) or complete pivoting (swapping rows and columns to bring the largest element in the 
whole matrix to the pivot position) to improve the accuracy. Row and/or column interchanges 
can destroy the structure of certain matrices, such as Toeplitz matrices. However, if Afi in (6) 
is diagonal (which is the case for Cauchy and Vandermonde type matrices), then the structure 
is preserved under row permutations. 



Partial pivoting can also be incorporated into structured Gaussian elimination. Suppose we 
wish to swap rows 1 and q of R±. Let P\ be the matrix which applies this permutation. Then it 
is easy to see that P\R\ satisfies (6) with the (1, 1) and (q, q) entries of Af t ± swapped, and with 

swapped row vectors ip\ and ip q . Thus, pivoting can be incorporated into Algorithm 2.1 by- 
adding the following steps: 

0.5 Initialization step. Set permutation matrix P = I. 

2.5 After step 2 of Algorithm 2.1. Let (yi) 9 be the largest entry by magnitude in y 1 . Swap 
rows 1 and q of P and $W, and the first and q-th diagonal entries in Afi. Recover the 
first row of P\R\ from Vp' 1 ' and the swapped <I>' 1 ). 

Note. The computation of the first row of the original w 1 ' in step 1 of Algorithm 2.1 may be 
omitted - we only require the first row of the swapped R^ 1 '. 

It may be seen that the pivoted algorithm computes upper and lower triangular matrices L 
and U which satisfy 

rW = p T LU . 

Note that for Cauchy-type matrices, where both Af\ and A^i are diagonal, both row and 
column pivoting may be performed. However, complete pivoting requires the computation of 
all the entries in the matrix, which would require 0(n 2 ) operations at each step and 0(n 3 ) 
operations in all. It will be seen in §6 that a restricted version of row-column pivoting can be 
used to improve the performance of the GKO algorithm. 

2.3 The Cauchy variant of the GKO algorithm (GKO-Cauchy) 

Recall that a Cauchy-type matrix satisfies the Sylvester equation (6) with 

Af t i = D t = diag(t 1 ,t 2 ,...,t n ) and A b>1 = D s = diag(si, s 2 , . . . , s n ) . 
It can be easily verified that if £« ^ Sj, then the (i,j) entry of R^ 1 ' = R is given by 

_ ifjipj 

There may be some cases where ti = Sj and (piipj = for some (i,j), and r^ cannot be recovered 
from its generator. We do not consider these cases in this paper. 
In general, at major step k, the reduced matrix R( k ' has the form 



R (k) 



d\ vvf 
'•• : 

4-i ^l-i 

••• R k 



The entries of the &-th Schur component Rk, may be computed by 

{Rk)i-k+l,j-k+l 



ffc) = ^ ^ | k<i,j<n (9) 



Equation (9) can be used in Algorithm 2.1 with pivoting to yield the Cauchy version of the 
GKO algorithm. 



Algorithm 2.2 (GKO-Cauchy) 

Input. Cauchy-type matrix R±, specified by t, s, <I>' 1 ' and Vl/' 1 '. 
Output. Factorization R\ = P LU, where P is a permutation, and L and 
U are lower and upper-triangular respectively. 

% Initialization 
7^0; C/-S-0; P <- 7 

for fc <— 1 : n % A;: Iteration number 

for j -(— k : n % recover col.l of P& 

, , (k) , (fc) 

r (k) ^_ <p) >ij 

jk tj—Sk 

end 

% Carry out row interchanges 

Find k < q < n such that \r , \ = maxfc<j< n \ r )k\ 
(fc) (fc) (fc) (fc) 

swap fc-th and q-th. rows of 7 
swap fc-th and g-th rows of P 

for j 4— k + 1 : n % Recover row 1 of swapped Rk 

(k) , (k) 
fcj tk-Sj 

end 

, (fc) 
^fcfc <- r fefc 

% Compute row and col k of L and J7, and update $ and $ using (7) and (8) 
for j <r- k + 1 : n 

/., <_ r W/ r ( fc ) 

hk ^ I jk I ' kk 
U fci <- r fc/ 



Ipj ^ Wj -1p k U kj/Ukk 



(fc+1) , (fc) (fc), 



end 
end 



2.4 The Toeplitz variant of the GKO algorithm (GKO-Toeplitz) 

Recall that a Toeplitz matrix satisfies the Sylvester equation (1), with Af, A^, $ and ^ being 
given by equations (3) to (5), and a Toeplitz-type matrix is one with Af and A\, given by (3), 
and with general low-rank <3? and \l/. The first row and column of the Toeplitz-type matrix can 
be simply generated from <I> and \l/, and this generating formula can be used in Algorithm 2.1 
to yield a structured Gaussian elimination algorithm for Toeplitz-type matrices. 

Because neither Af nor A\, is diagonal, pivoting cannot be introduced directly into this 
structured algorithm - pivoting will destroy the Toeplitz-type property. However, the Toeplitz- 
type matrix can be easily converted, by fast orthogonal transformations, into a Cauchy-type 
matrix which can be factorized as in Algorithm 2.1. The inverse orthogonal transforms yield 
the factorization of the original matrix. The following result of [8] shows how this conversion 
may be done. 



Theorem 2.2 Let T be a Toeplitz-type matrix, satisfying 

v {ZltZ _ l} (T) = nr, 

n = [ujj u% •••w^] T , r = [7i 72 ••• 7„], 

where the {oj{\ and the {7^} are 1 x a and a x 1 respectively. 
Then 

R = FTD^F* (10) 

is a Cauchy-type matrix, satisfying 

V {D F ,D F J = $# , 

where F = -7=[e 2m ( k ~ 1 'V~ 1 >' n ]i<k,j< n is the Discrete Fourier Transform matrix, 

D F = diag(l, e 2ni / n , ..., e 2 ^™" 1 )/™) , D F _ = diag(e^/ ra , e 3ni / n , ..., e^ 2 ™" 1 )/™) , (H) 

D = diag(l, e^ n , ..., e"^" 1 )/") 

and 

<f> = Fn, V* = FDT* . (12) 



Theorem 2.2 allows the generators of T to be converted to the generators of R in 0(2an log n) 
operations via FFTs. R can then be factorized as R = P LU using Algorithm 2.2. Using (10), 
we then obtain 

T = F*P T LUFD . (13) 

From this factorization, a linear system in T can be solved in n 2 + 2nlogn operations, so 
the whole procedure of conversion of Cauchy form, factorization and solution requires 0(n 2 ) 
operations. 

3 Error Analysis of the GKO- Cauchy Algorithm 

In this section, a backward error analysis will be carried out, which yields a bound for the 
perturbation matrix E, defined by 

LU = R + E, (14) 

where R is the matrix to be factorized, and L and U are the computed factors. In the analysis, 
we first derive some preliminary results which apply to any algorithm for structured Gaussian 
elimination (SGE), and indicate a general methodology for error analysis of SGE algorithms. 
We then carry out the analysis for Cauchy-type matrices in general and for the Cauchy-type 
matrix derived from a Toeplitz matrix by equation (10). 

3.1 Preliminary results 

The following two lemmas may be used for the error analysis of SGE algorithms in general, and 
the GKO-Cauchy algorithm in particular. The first lemma shows that if G is the perturbation 
in the Sylvester equation caused by replacing R by LU, then the displacement of E is G. 



Lemma 3.1 Let R be a general structured matrix that satisfies (1), let Af, A b , $ and \I/ be as 
defined above, and let L, U and E be as in (14). Suppose L and U satisfy 

AfLU - LUA b = *$ + G ; (15) 

then E satisfies 

V {Af:Ab} {E) = A f E-EA h = G. (16) 

Proof. From (14) and (15), 

A f (R + E)-{R + E)A b = $^ + G . 
Expanding the above, and using (1) we obtain (16). D 
Corollary 3.2 If R is a Cauchy-type matrix with Af = Dt and Ab = D s , then E satisfies 

D t E -ED S = G (17) 

and 

'-■--^— i,j = l,...,n (18) 



H Sj 



Proof. (17) follows directly from (16), and (18) follows by evaluating each component of 
(17). D 

If R is a Toeplitz-type matrix, Af = Z\, A b = Z_±, and E satisfies Z\E — EZ^\ = G. 
Because Z\ and Z_i are not diagonal, the recovery formula for E is a little more involved, and 
will be derived in the next section (Lemma 4.3). 

The second lemma of this section shows that G is the sum of the local perturbation matrices 
incurred in each step of the relevant structured Gaussian elimination (SGE) algorithm. 

Lemma 3.3 Let V/^. A b \ be the displacement operator as defined in (1); let L, U and G be 

as defined above; let the {&( k > , ^f^ k '}k=i,2,... be the computed generators of the {R' k }k=i,2,..., the 
reduced matrices at step k of SGE, and define <b( n+1 > = vj>( n+1 ) = 0. Then 

n 

G = ]T H k , (19) 

fc=i 

where H k , the local perturbation in each step of SGE, is defined by 

V^^&fciifc:) = *W*W - & k +V¥ k+ V + H k , k = l,...,n. (20) 

Proof. Writing (20) explicitly, we get 

A f i k u k: -i k u k: A b = $W¥Q-& k+ V¥ k+ V + H k ^ k = l,...,n. (21) 

Summing the members of (21), we obtain 

n n n 

^/£l:kUk: -£l:kUkA = ^^ " *( n+1 >*(" +1 ) + £ H k . (22) 

i=l i=l i=l 

Now ELiUuk: = LU, 4>W = $, *( 1 ) = * and 4>( n+1 ) = §( n+1 ) = 0. Substituting these 
identities into (22) and comparing the resulting relation with (15), we obtain (19) D 



3.2 Methodology of error analysis for SGE algorithms 

Lemmas 3.1 and 3.3 may be used in a general methodology for the error analysis of SGE 
algorithms similar to Algorithm 2.1. 

In the following methodology and the subsequent analysis of the GKO algorithm, we now 
let <J?( fc ) and \p( fc ) be the computed values of these quantities, u^., Y k ' nk , Y k , 3>( fe+1 ) and ^( fc+1 ) 
be the values of these quantities computed in exact arithmetic from & k > and Vl/W using steps 1 
to 3 of Algorithm 2.1, and u k: , i k . nk , \ki & k+1 ' and ijA fc+1 ) be the actual computed values of 
Ufc : , v,.',, \. k , cj>( fc+1 ) and $( fc+1 ) respectively. The methodology is as follows: 

1. Using a standard rounding error analysis, derive expressions of the form 

Ufc: = U fe; + 5u k: (23) 

~(fc) _ (k) , r~(fc) ( 0A \ 

r k:n,k ~ r k:n,k + 0Y k:n,k V Z4 J 

Ik = Ik+Slk (25) 

$(k+l) = $(*) _ l k< j,(k) + ^( fc +i) (26) 

§(fc+i) = #-#u jb: /fg + ^* +1 ) (27) 

where <5iifc : , etc. are error terms. 

2. Evaluate $( fc )vl/( fc ) - $( fe + 1 )vjir( fc + 1 ) using (23) to (27). This can be expressed in the form 

$(*)*(*) _ $(*+!)*(*+!) = A/ i :fcflfc: _ l k u k .A b + F fc , (28) 

where F k is an error term. By (21), 

Hk = — Fk ■ 

3. After some manipulation, express Fk as a sum of terms of the form 

S(A f , A b ) o T{VW) o Lfcfifc: o A or S(A f , A h ) o T(V^ k+1 ^) o L : , k+1:n U k+1:nt: o A . 

Here, the S^Aj,^) are matrices formed from Af and Ab, A is a matrix whose elements 
are bounded in magnitude by e, and V^ ' is defined by 

|$0)||^( fc )| = y( k ) o $0)^0) . (29) 

4. Apply (19) to derive an expression for G. 

5. Lemma 3.1 shows that G satisfies 

V {AfAb} E = G . (30) 

Using the appropriate algorithm to recover a structured matrix from its generators, derive 
an expression for E from the expression for G. Note that in general, G will be of full rank. 
However, (30) will still be satisfied by E and G. 

6. Derive bounds for ||ii7|| using some norm. 



3.3 Error analysis of GKO for Cauchy-type matrices 

In this subsection, we use the above methodology to derive the first of our main results — a 
bound for \\E\\ when a Cauchy matrix R is factorized by the GKO algorithm. The results are 
encapsulated in three theorems, which yield expressions for the {H k }, an elementwise bound for 
G, and a bound for \\E\\ respectively. We then discuss the size of the bound for \\E\\. 

Theorem 3.4 Let R be a Cauchy matrix to be factorized by the GKO algorithm and let F k , H k , 

V^ k > , l :k , u k: be as defined above. Then 

F k = Cl A« o D^D p l k u k: + c 2 l k u k .D q D^ o A^ + c^rf^v^ A® o l :fc u fc: + 

c 4 A( 4 ) o b 1 o y( fc +!) o L k+1:n u k+1:n , 

where c\ to C4 are small constants, D{, c = diag(tr fc ), D\, r = <X\a.g{v k . ), D p = diagjtj — s k }i, 
D q = diagjtfe — Sj}j, B = [l/(ti — Sj)] is the ordinary Cauchy matrix with displacement operator 
^{D s ,D t }> the A(' are matrices whose elements are less than e in magnitude, and H k = — F k . 

Proof. In the following, we simplify our notation and drop the superscript (k); where the 
superscript is (k + 1) we indicate this by a prime ('); and we drop the subscripts : k, k : and 
k : n,k. In the following, we do not give all the steps in the derivation of the various expressions, 
as these are straightforward but very tedious. However, we indicate how key intermediate 
expressions are derived. 

We use the normal properties of floating point operations performed with at least one guard 
digit, viz. fl(a) = a{l + 5\) and fl(a-kb) = (a*6)(l + <5 2 ), where fl(a) denotes rounding, fl(a-kb) 
is the computed result of any of the four basic floating-point operations, and |<5i|, j^l < e. 

Following step 1 of the above methodology, we evaluate expressions for the computed values 
of f , 1 and u (subscripts and superscripts dropped), yielding after a few steps 

u = u + 2iiA (1) +(p k V {1) ^D- 1 , (31) 

f = r + 2A (2) f + L> p 1 P (2) ^fc , (32) 

I = l + ZA^n + ^D^V^^k-bkkr^d^UM- 

Here and below the A'') denote diagonal matrices with elements of magnitude less than e; the 
T>(> are elementwise operators which multiply each element of their matrix operands by a factor 
less than e, and the d, are similar elementwise vector operators. 

Similarly, it can be shown that the computed values of $' and ^' satisfy 

$' = ^-X^ + V^ti + V {A) (\d> k ) , 

*' = *-7p k u/r k k + V^* / + 2V ( > 6 \7p k u)/r kk . 

Carrying out step 2 of the above methodology, we obtain 

m - $'§' = $ip k u/rr kk + l^m - l^ fc u/f fcJfc - 2^V^\ip k u)/r k k ~ $'P (5) V - 

V^(^k)* ~ P< 3 >$'tt' + vW(\4> k )4, k u/r kk + 2l<t> k V^^ k u)/hk ■ (33) 

Let T3 denote the first three terms in (33). From Algorithm 2.2, we have $Vfc = D p r and 
(j)^ = uD q . Using these relations in T3, and expressing r in terms of (f - error terms) using 
(32) and u in terms of (u - error terms) using (31), we can show that 

T 3 = D t lu-luD s -3D p A^Hu-2\uA {5) Dg + 2r kk 1 8\u- 

V^ <S>4> k u/r kk - 1<^£>« * + r^d^ cf> k ^ k lu , (34) 

10 



where \5\ < e. By using (34) for the first three terms of (33), we get an equation of the form 
(28), where F k is given by the last six terms in (33) plus the last six terms in (34). Terms 
involving the T>^'> may be expressed in terms of lu or LU by using the definition of V, which in 
the current notation is 

= \4>i\\^j\ 

Consider the factor $p( 6 ) (^u) jf kk in the term — 2^D^'(ip k u)/f kk . We have 

($Z>( 6 >(V»fcU)/f fcfc )v = <t>idf\i>ku 3 )/? kk . 
Recall that <pi = [</>ji)<fe] an d ipj = [ipij,ip2j]- Then 

($X> (6) (^ fe u)/ffejfc)y = (<pilS[/^ik + (t>i2S2j-lp2k)Uj/r kk , 

where 5\a and 5\a are the scaling factors from the operator <9- . From the definition of V, 
using the fact that l{ = r~i k /r kk , this can be shown to be 

(<5>V( 6 \4, k u)/r kk )ij = Qfvikb^Uuj , 
where |<5 4 -- | < maxj = i : 2 \5 k - \ . In matrix form, we obtain 

$P (6) ((/> fc fi)/f fefc = A o diag{v ik /b ik }\u , 

where A and subsequent A(') are matrices with elements bounded in magnitude by e. Similarly, 
all the other terms can be expressed as either 

(i) an elementwise product of A'-* and a normal product of lu and matrices derived from B 
or V, or 

(ii) an elementwise product of the form A'- 1 o B o V' o L :jk +i :n Uk+i :n ,:- 

When this is done, the result follows. D 

The next theorem uses Lemma 3.3 to obtain an elementwise bound for \G\. 

Theorem 3.5 Let H k be as in Theorem 3.4- Then 

\G\ < Cl b m lA^ o \L\\U\ + c 2 b;J m \L\\U\ o A( 2 ) + c 3 6^ n A( 3 ) o \L\dtog{v$}\U\ 

n 

+ C4|5 J | oA< 4 )o£ |# fc | 

k =2 

where b m - m is the minimum modulus of the elements of B, L = [v. fc ]^ =1 ° L, U = U o [vL ]i? =1 , 
and R' k = V (fc) o L. yk:n U k:ny: . 

Proof. G is evaluated by carrying out the summation in (19), and using the identities 
E"=fc a :fe b fc: = AB and £™ =fc x fc a. fc b fc: = Adiag{x k }B. D 

We now apply the last step in the above methodology to derive an expression for ||-E|[. 



11 



Theorem 3.6 Let E be the backward error E = LU — R in the factorization of R using the 
GKO algorithm, let L, U , R, B and V be as above. Then \\E\\ is bounded by 

II ztMI ^ I "max . illrllllrrll /oc\ 

\\E\\ < e ( c 5 - gi + c 6 ng 2 J \\L\\\\U\\ , (35) 

\ "min / 

where the Frobenius norm is used, 6 max and 6 m i n are the maximum and minimum moduli of 
the elements of B, C5 and cq are small constants, and g\ and g 2 ar e generator growth factors, 
defined by 

9 1 = c 7]i7Tt + c 8-fT77T| + c 9l|diag{4fc}||, (36) 

II II II ^ II 

g 2 = max {\R k \\/\\R k \\} , (37) 

k=2,...,n 
With Cj, Cg, Cg < 1. 

Proof From step 5 of the above methodology, we essentially invert the Sylvester equation 
(30) to derive an expression for E. To do this we apply (18) in Corollary 3.2. This can be 
written in matrix form 

E = BoG 



s< * 



|E| = |B|o|G| < Cl ^A( 5 )o|L||£/| + c 2 ^|| L ||[/|oA( 6 ) + 



•'nun 



7 n 

C3 _E^A( 7 ) o |L|diag{ Ufefc }|t/| + c 4 A( 8 ) o £ \R' k \ . (38) 

Omin k=2 

We now define g 2 = max fc=2 ,...,„ ||.R( fc )||/||Jtf fc )||, g 4 = ||£||/||£||, 95 = \\U\\/\\U\\ and 

(k) 

g§ = ||diag{ujj. fc }||. These can be considered to be generator growth factors — they are functions 
of the V"W, which from the definition (29) are the ratio of the products of the magnitudes of 
the generators to the products of the generators. We will see in §5 that these growth factors 
can sometimes be large. 

Taking the Frobenius norm of (38), we can easily show that 

II eh II ^ r "max n r II II rr II i £ "max 11 r n m 7-7-11 , c "max Mrllllrrll 1 r llrllllrrll /on\ 

\\E\\ <C\di- #3 ||L|| ||t/||+C202ff4T ll^llll^ll+ c 3C ) 3T 9h\\L\\ \\U\\ + CqUO^ \\L\\ \\U\\ . (39) 

"min "min "min 

where < \8\\, . . . , |&4| < e. The result follows by collecting the first three terms of (39). D 

The following corollary specializes the above result to the case when R is derived from a 
Toeplitz matrix. 

Corollary 3.7 Let R be derived from a Toeplitz matrix T by the transformation (10) in 
Theorem 2.2, and let c\, c 2 , g\, g 2 and E be as defined in Theorem 3.6. Then ||Z£|| is bounded 
by 

p7||<eci O 03n||L||||l7||, (40) 

where c\q = max(2c5/7r,C6) and g% = m&x(gi,g 2 ). 

Proof. Recall that B = [1/(j^— Sj)] is the ordinary Cauchy matrix with displacement operator 
V{D si D t ); from equations (11) in Theorem 2.2, the ti are n equally-spaced points around the 
unit circle, including one at (1,0), and the Sj are also n equally-spaced points around the unit 
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circle, with each Sj between two of the £j. Clearly ir/n < t{ — Sj < 2 Vi, j, so by the definition 
of B, 

^ < 2n/7r . (41) 

Omin 

Substituting (41) in (39), bounding 2cs/7r and C6 by cio, and bounding #i and 52 by 33 yields 
the result. D 

The above results show that the expressions for the backward error bounds from the GKO 
algorithm are similar to the ones for Gaussian elimination with partial pivoting (GE/PP) [9], 
except for the generator growth factors which might arise in particular cases where the $' ' and 
\pw are large, but not the <I>W\j/( fe ) or the R^. So there may be some cases where large error 
growth may occur in the GKO algorithm but not GE/PP. In §5, we give an example where this 
occurs. 

4 Error Analysis of the GKO-Toeplitz Algorithm 

Recall that the steps in the GKO-Toeplitz algorithm are (i) compute the generators from the 
Toeplitz matrix T using (4) and (5), (ii) convert them to generators of a Cauchy matrix using 
(12) and (iii) compute factors L and U of this Cauchy matrix using the GKO algorithm. The 
factors of T are then given by (13). There are errors incurred at each of these steps. In this 
section, we do not consider permutations, as these do not contribute to the error. We will derive 
a bound for the perturbation matrix Et, defined by 

F*LUFD = T + E T . 

In our development, we show in Theorem 4.1 that Et consists of two components — the first due 
to the error ||.E|| incurred in the Cauchy factorization and the second due to the errors incurred 
in computing the Cauchy generators <!> and ^f. The latter is a Toeplitz- type perturbation AT 
such that T + AT transforms exactly to $ and ^f. We then derive two lemmas needed to derive 
AT, and then present the main result of this section in Theorem 4.4. 

4.1 Main components of E T 

Et has two main components, as is shown in the following. 

Theorem 4.1 Let F and D be as in Theorem 2.2, let <I> and \I/ be the Cauchy generators 
computed using (4), (5) and (12), and let L and U be the factors computed from <I> and ^ using 
the GKO algorithm. Then the perturbed factorization of T satisfies 

F*LUFD = T + E T = T - F*EFD + AT , (42) 

where E is as in Theorem 3.6 and AT is a Toeplitz-type perturbation of T such that T + AT 
has generators Q, and T that transform exactly to $ and ^ using (4), (5) and (12). 

Proof. Let R be the Cauchy matrix generated by $ and ^. We have 

R = LU + E , 

and we know from (10) that $ and \& are the generators for 

R = F(T + &T)D- l F* 

where T + AT is some Toeplitz-type matrix. From the above two equations we obtain 

T + AT = F*RFD = F* {LU + E)FD , 

from which the desired result follows. D 
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Thus, by (42), we see that Et has one component with the same norm bound as E, and 
another which perturbs T to a matrix such that its generators, say 0, and T, transform exactly to 
$ and \L'. Before we derive an expression for AT, we need two preliminary results : expressions 
for S7 and T, and a method to recover T + AT from its generators f2 and T. 

4.2 Estimation of AT — preliminary results 

The required results are given in the following two lemmas. 

Lemma 4.2 Let £1 and T be as in (4) and (5), and let f2 and T transform exactly to <L> and \L r 
using (12). Let [a, b] = f2 — f2 and let [c, d] = T* — T* . Then 

a = 

and \\h\\, ||c|| and ||d|| are bounded by 

||b|| < efcin 3/2 ||cj :2 || , (43) 

||c|| < efc 2 n 3 / 2 || 7l: || , (44) 

||d|| < e . (45) 

Proof. We first consider the errors incurred in the computation of <L> and fy. We have 

l> = /Z{F[ei,w :2 ]}, where F = fl(F) ,u :2 = fl(u :2 ) 
= [l,fl(Fu> :2 )], where 1 = [1, 1, . . . , 1] T 

= [l,Fu.. 2 + k 3 n\\u.. 2 \\8 m ] 

where \5\ | < e , i = 1, . . . , n. After a few more steps, this becomes 

l> = T[ei,cj :2 + b] 

where b = A^-'cj^ + k±(n + l)||cj ; 2||T*<r '. Ina similar way, it can be shown that 

V* = FD[y* 1: +c,e n + d} 

where c = k 5 D* A® D~fl- + h(n + 1)||7 1: ||<5 (2) and d = D*F*d n A^i^.. Now the expressions in 
square brackets transform exactly to f2 and T respectively, and by taking norms of b, c and d 
the bounds (43) to (45) can be demonstrated in a few steps. D 

Lemma 4.3 For any matrix A, let X7tZi,Z-i}A = B. Then A can be recovered from B using 

n j-1 

a ij = /^i "l+(«+fe-j) mod n,k ~ £^i ^l+(i+k—j) mod n,k ■ (46) 

k=j fc=l 

Proof. From the displacement operator V/^ z_ 1 } ) the following properties of B are easily 
seen: 

bij = ai-ij - dij-i, 1 <i <n ,1 < j <n , (47) 

hj = a nj - a i;j+1 , 1 < j < n , (48) 

bin = a i-i,j + a «i ; 1 < i < n and (49) 

bin = a n ,n-i + on • (50) 

Lt can be verified that if the elements of A are given by (46), then (47) to (50) are satisfied. D 
Equation (46) shows that an element a^ is recovered by computing x — y, where x is the 
sum of elements of B down the diagonal, commencing from 6j+ij and proceeding to the last 
column, wrapping from the last row to the first if necessary during the summing; y is a similar 
"wrapped diagonal sum" from the first column to frjj-i- 
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4.3 Main result 

We now use Theorem 4.1, Lemma 4.2 and Lemma 4.3 to derive a bound for the backward error 
||£r|| in the GKO-Toeplitz algorithm. 

Theorem 4.4 Let F and D be as in Theorem 2.2, and let L and U be the factors computed 
from T using the GKO- Toeplitz algorithm. Then the perturbed factorization of T satisfies 

F*LUFD = T + E T = T + EW + £ (2) , (51) 

where E^ l > is a general matrix with norm \\E^-'\\ = \\E\\, E is as in Theorem 3.6, and E^ 2 > is a 
Toeplitz-type matrix with norm bounded by 

P (2) ||<e Cll n 2 (||t 1: || + ||t :1 ||). (52) 

Proof. By comparing (51) and (42), we see that E^> = —F*EFD, and because F and D 
are orthogonal matrices, 

\\E {1) \\ = \\E\\. (53) 

From the above comparison we also have E^ 2 ' = AT, a Toeplitz-type perturbation of T such 
that T + AT has generators f2 and T that transform exactly to the Cauchy generators $ and 4? 
computed using (4). In the following, we use E^ 2 > for AT. From Lemma 4.2, we have 



V(T + £ (2) ) = QF = nr + e x c* + u :2 d* + be 



T 

n > 



where b, c and d are bounded as in (43) to (45). The second-order error term bd* has been 
omitted. We then have 

V£ (2) = eic* + cj :2 d* + be£ , 

and we use (46) to compute E^ 2 >. This yields, after some algebra 

le^l = Cj-idc^l + |b fl |) + |p :j -| 

where Ck is a matrix which by premultiplication, circularly upshifts a vector k places, x 
indicates the reversal of x, and the moduli of p :J - are bounded by 

\Pij\ < |o; : 2| T Cj_i_i|d| 

< ||cj :2 ||||d|| . (54) 

Using (43), (44), (54) and (45) it is easily seen that 

He^H^cian^dlwall + hi:!!). 

From this, using the definitions (4) and (5), we obtain the bound (52) for E^ 2 >. Together with 
(53), this yields the result. D 

5 Discussion of Error Bounds 

We first discuss the factors in the above error bounds and relate them to what would be expected 
for Gaussian elimination with partial pivoting (GE/PP). Then we show, for both the Cauchy 
and Toeplitz variants, that there are some cases where the backward error growth can be large. 
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5.1 Relation of bounds to those for GE/PP 

Consider the backward error E incurred by the Cauchy variant (equation (35)). The term 
||L||||C/|| is similar to that obtained for GE/PP [9]. However, the first factor contains the gen- 
erator growth factors g\ and Q2 ■ These are given by ratios of norms of the hatted quantities to 
the unhatted quantities in (36) and (37). The former are derived from the latter by element- 
wise multiplication by submatrices of the V^ k \ which from their definitions (29) are the ratio 
of the products of the magnitudes of the generators to the products of the generators. For an 
ordinary Cauchy matrix, v\a = 1 Vi,j,k because <&' ' and Vl/( ' have only one column and row 
respectively. However, for higher displacement-rank Cauchy matrices, there may be significant 
cancellation in the computation of the denominator of (29), so they may be significant growth 
in the size of the L, U and Rk compared to the L, U and Rk respectively. 

The backward error Et incurred by the Toeplitz variant has two components — one with 
the same norm as E above, and a Toeplitz-type component with norm bounded as in (52). The 
latter bound is proportional to n 2 and contains no growth factors, so it would be expected that 
the bound would be dominated by the first component. 

We next give examples where the generator growth might be expected to be large in the 
Cauchy and Toeplitz variants. 

5.2 Examples of large generator growth 

Cauchy case. Here, we can select an example where all the elements of V = V^ 1 ' are large. 
This will occur when significant cancellation occurs in the computation of the (piipj. Such an 
example is 

$ = [a, a + f] , * = [a, -a] T , 

where |ja|| is of order unity, and ||f|| is very small. Then &*$> = —fa, that is, all the elements 
of $^ are very small compared to those of |$||^|. Moreover, because a and f can be arbitrary 
except for their norms, the original matrix [(£j — Sj) -1 ^^] is in general well-conditioned. 

Toeplitz case. The Toeplitz case has an extra constraint on the selection of $ and *&, since 
it must be generated from il and T using the transformations (12). Because of this constraint, 
there is no case where all the elements of V can be made large. However, all of the first column 
of V can be made large, and this will cause error growth, in spite of the pivoting. This will 
happen in the following case. 

Recall that Oj_j = Uj Vi,j. Select 

a = 1 (55) 

and Oj = —ai- n , 1 < i < n — 1 , (56) 

so that f2 = [ei,ei]. Then all of the first column of V will be large if ipu + ip±2 is very small 
compared to ipu and ^12- It can be verified from (12) that if we select a±, . . . , a„_i to satisfy 

n-l 

} j a n -j exp(iir(j — l)/n) = — exp(nr(n — l)/n + 8/2 (57) 

3=1 

then ipu + ipi2 = 8. There is a wide variety of choices for the aj. Let n be even, and set 

On-j = (58) 

except for a n /2-\ and a n _i- Then (57) is satisfied when 

On/2-1 = -sin(7r/n) + ^s{8/2) , a n _i = cos(7r/n) + $1(5/2) . (59) 
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So if 5 is small, and the a,j are selected according to (55), (58), (59) and (56), all of the first 
column of V will be large, with magnitude 0(1/5). 

Numerical examples. Order-8 Toeplitz matrices were generated according to (55), (58), (59) 
and (56), with 5 = 10 , k = 2, . . . , 16. For each matrix, the system Tx = 1 was solved. It 
was found that the normalized solution error ||x — x||/||x|| grew as the square of 1/5, and the 
normalized residual ||Tx — l||/j|b|| grew linearly with 1/5. Thus the algorithm is only weakly 
stable in this case. 

6 Modified GKO Algorithm 

The problem with the original pivoting strategy is that when all elements of r ; i are small and 
all elements of v : i are large, normal partial pivoting will not stabilize the algorithm. Complete 
pivoting will do so, but requires 0(n 2 ) operations to find the pivot at each major step and 0(n 3 ) 
operations overall. However, a strategy of using the largest element in the first row and column 
should stabilize the algorithm in most cases, and we see that it does in the above cases. 

To incorporate this row- 1 /column- 1 pivoting, it is easy to see that the following steps should 
be added to the GKO algorithm (Algorithm 2.2): 

• Step 1: add substep P' <— /, where P' will be the matrix of column interchanges. 

• After loop to recover column 1 of Rk '■ add loop to recover row 1 of Rk- 

• After loop to find maximum maxi in column 1 : add loop to find maximum max2 in 
row 1. If maxi > max2, carry out row interchanges as in Algorithm 2.2. Otherwise carry 
out column interchanges by swapping the appropriate elements in s, ty( k ' and rL , and 
the appropriate columns in U and P' . 

• After computation of L, U, P and P', the factors of R are P LUP . 

Results. When the modified algorithm was used on the same set of systems as was considered 
in the previous section, it was found that the normalized solution error ||x — x||/||x|| grew 
linearly with 1/5 and the condition number of T, and the normalized residual ||Tx — l||/||b|| was 
approximately constant at about 4 x 10 -15 , a small multiple of e. Thus the modified algorithm 
is stable in this case. 

7 Conclusions 

It has been shown that bound for the backward error in the GKO algorithm is similar to that 
for partial pivoting, except that extra factors, the generator growth factors, are included. These 
factors can be large when there is sufficient cancellation in the computation of the generators. 
Examples of this have been presented, and it was demonstrated that the original GKO algorithm 
was only weakly stable in these cases. A modified version which uses row 1/column 1 pivoting 
was then presented; this version was stable in these cases. 

It is not known whether there are any cases upon which the modified algorithm will give 
large errors. Further work needs to be done to ascertain this, and if such cases can be found, 
the pivot strategy needs to be improved further. The aim is to find the maximum in R, or an 
element close to the maximum, still in 0{n) operations. An extension of the above strategy may 
be to have a few iterations in the search, i.e. search for the row- 1 /column- 1 maximum, say at 
r± p , then search along column p for the maximum there, and so on. This may find a better pivot 
at the expense of some extra work. 

A practical strategy is to use the modified algorithm of §6 followed by a check of the residual; 
in the unlikely event that the residual is large we can resort to a stable 0(n 3 ) algorithm. 
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